CITE-seq dataset

In this tutorial, we use a public CITE-seq dataset to illustrate joint analysis with LinQ-View. The data can be downloaded from NCBI (RNA and ADT).

More details about this dataset can be found on the Seurat website.

Step 3 Pre-process

Users can use either the original Seurat functions or our own functions for the pre-processing steps.

1) Filter out unwanted cells (optional)

For this dataset, we don’t need to filter out unwanted cells.

2) Remove unwanted genes (optional)

For this dataset, we don’t need to filter out unwanted genes.

3) Normalization

Normalize the data for both modalities: ADT with the centered log-ratio (CLR) transform and RNA with log normalization.
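To make the two normalizations concrete, here is a minimal NumPy sketch of what they compute. It is not Seurat's code: the function names `log_normalize` and `clr_normalize` are hypothetical, matrices are assumed to be features x cells, the RNA scale factor of 10,000 mirrors Seurat's usual default, and the CLR uses the textbook per-cell form with a pseudocount of 1 (Seurat's own CLR differs in details, such as which margin it normalizes over).

```python
import numpy as np

def log_normalize(counts, scale_factor=1e4):
    """RNA: divide each cell by its total count, multiply by scale_factor, then log1p.
    counts: genes x cells matrix of raw UMI counts (cells assumed to have nonzero totals)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=0, keepdims=True)          # total counts per cell
    return np.log1p(counts / totals * scale_factor)

def clr_normalize(counts):
    """ADT: centered log-ratio within each cell, using a pseudocount of 1.
    counts: proteins x cells matrix of raw ADT counts."""
    logged = np.log1p(np.asarray(counts, dtype=float))  # log(x + 1)
    # Subtracting the per-cell mean of the logs divides by the geometric mean.
    return logged - logged.mean(axis=0, keepdims=True)
```

After CLR, each cell's protein values sum to zero, so cells with very different total ADT counts become comparable.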

4) Identify HVGs for RNA data

Call the Seurat function to identify highly variable genes (HVGs) in the RNA data.
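The idea behind HVG selection is to rank genes by how much they vary across cells relative to their average expression. The sketch below uses a plain variance-to-mean (dispersion) ranking on normalized data; this is a simplification and not Seurat's default selection method, and `top_variable_genes` is a hypothetical name.

```python
import numpy as np

def top_variable_genes(norm, n_top=2000):
    """Indices of the n_top genes with the highest dispersion (variance / mean).
    norm: genes x cells matrix of normalized expression."""
    norm = np.asarray(norm, dtype=float)
    mean = norm.mean(axis=1)
    var = norm.var(axis=1, ddof=1)
    # Genes that are never expressed get dispersion 0 instead of dividing by zero.
    dispersion = np.divide(var, mean, out=np.zeros_like(var), where=mean > 0)
    n_top = min(n_top, norm.shape[0])
    return np.argsort(dispersion)[::-1][:n_top]          # highest dispersion first
```

Only the selected genes are then carried into scaling and PCA, which reduces noise from genes that carry little cell-to-cell information.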

5) Data scaling

Scale the data for both ADT and RNA.
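Scaling puts every feature on a comparable footing before PCA. A minimal sketch, assuming features x cells input: each feature is centered to mean 0, divided by its standard deviation, and large values are capped. The cap (`max_value`) imitates the kind of upper limit Seurat applies when scaling; the function name and default are assumptions.

```python
import numpy as np

def scale_features(mat, max_value=10.0):
    """Z-score each feature (row) across cells and cap values above max_value.
    mat: features x cells matrix (genes or proteins)."""
    mat = np.asarray(mat, dtype=float)
    mean = mat.mean(axis=1, keepdims=True)
    std = mat.std(axis=1, ddof=1, keepdims=True)
    std[std == 0] = 1.0               # constant features simply become all zeros
    scaled = (mat - mean) / std
    return np.minimum(scaled, max_value)
```

Without scaling, a few very highly expressed genes or very abundant proteins would dominate the principal components.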

Step 4 Linear dimension reduction (PCA)

Directly call the Seurat function for linear dimension reduction (PCA).
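What PCA computes on the scaled matrix can be sketched with a singular value decomposition. This NumPy version is illustrative only (`run_pca` is a hypothetical name, not Seurat's function); it assumes features x cells input and returns cell embeddings, feature loadings, and the standard deviation of each PC.

```python
import numpy as np

def run_pca(scaled, n_pcs=20):
    """PCA via SVD. scaled: features x cells matrix.
    Returns (cell embeddings [cells x n_pcs], feature loadings [features x n_pcs], PC stdevs)."""
    X = np.asarray(scaled, dtype=float).T            # cells x features
    X = X - X.mean(axis=0, keepdims=True)            # center each feature
    n_pcs = min(n_pcs, min(X.shape))
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    embeddings = u[:, :n_pcs] * s[:n_pcs]            # cell scores on each PC
    loadings = vt[:n_pcs].T                          # feature weights per PC
    stdev = s[:n_pcs] / np.sqrt(max(X.shape[0] - 1, 1))
    return embeddings, loadings, stdev
```

PCs come out ordered by decreasing standard deviation, which is what the next step uses to decide how many of them to keep.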

Step 5 Determine number of PCs

Call the Seurat function JackStraw to determine the number of PCs to use.
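JackStraw is a permutation test: a small random subset of genes is shuffled across cells, PCA is rerun, and the loadings of those shuffled genes form a null distribution against which the real loadings are compared. The sketch below is a simplified NumPy re-creation of that idea, not Seurat's implementation; the function names and the small defaults (`n_reps=20`, `prop_freq=0.01`) are assumptions chosen to keep it fast.

```python
import numpy as np

def _gene_loadings(mat, n_pcs):
    """Absolute gene loadings on the first n_pcs PCs. mat: genes x cells."""
    centered = mat - mat.mean(axis=1, keepdims=True)
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    return np.abs(u[:, :n_pcs])          # abs() removes the SVD sign ambiguity

def jackstraw_pvalues(scaled, n_pcs=5, prop_freq=0.01, n_reps=20, seed=0):
    """Simplified JackStraw: empirical p-value for every gene on every PC."""
    scaled = np.asarray(scaled, dtype=float)
    rng = np.random.default_rng(seed)
    n_genes = scaled.shape[0]
    observed = _gene_loadings(scaled, n_pcs)
    n_perm = max(3, int(round(prop_freq * n_genes)))
    null_blocks = []
    for _ in range(n_reps):
        idx = rng.choice(n_genes, size=n_perm, replace=False)
        permuted = scaled.copy()
        for i in idx:
            permuted[i] = rng.permutation(permuted[i])   # break gene-cell structure
        null_blocks.append(_gene_loadings(permuted, n_pcs)[idx])
    null = np.vstack(null_blocks)        # (n_reps * n_perm) x n_pcs
    # p-value = fraction of null loadings at least as large as the observed one
    return (null[None, :, :] >= observed[:, None, :]).mean(axis=1)

def fraction_significant(pvals, alpha=0.05):
    """Per-PC share of genes with p <= alpha; a value near alpha suggests no signal."""
    return (pvals <= alpha).mean(axis=0)
```

PCs whose share of significant genes clearly exceeds `alpha` carry structure beyond noise; the point where that enrichment fades is a reasonable cutoff for the number of PCs.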

Step 6 Distance calculation and joint distance calculation

Calculate cell-cell distances for RNA, for ADT, and for the joint analysis. The number of PCs is set to 20 by default.
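As an illustration of this step, the sketch below computes Euclidean cell-cell distances in PC space (first `n_pcs` PCs) for each modality. How LinQ-View combines the two matrices is not described in this section, so the `joint_distance` function is HYPOTHETICAL: it rescales each modality's distances by their mean off-diagonal value and takes a weighted average, which is one simple way to put the modalities on a common scale and not necessarily LinQ-View's formula.

```python
import numpy as np

def pairwise_euclidean(emb):
    """Cell-cell Euclidean distances from a cells x dims embedding."""
    sq = np.sum(emb ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    np.maximum(d2, 0.0, out=d2)          # remove tiny negatives from round-off
    np.fill_diagonal(d2, 0.0)
    return np.sqrt(d2)

def joint_distance(rna_emb, adt_emb, n_pcs=20, weight=0.5):
    """HYPOTHETICAL joint distance: mean-rescale each modality, then average.
    If a modality has fewer than n_pcs PCs, all of its PCs are used."""
    d_rna = pairwise_euclidean(rna_emb[:, :n_pcs])
    d_adt = pairwise_euclidean(adt_emb[:, :n_pcs])
    off = ~np.eye(d_rna.shape[0], dtype=bool)        # off-diagonal mask
    d_rna = d_rna / d_rna[off].mean()
    d_adt = d_adt / d_adt[off].mean()
    return d_rna, d_adt, weight * d_rna + (1.0 - weight) * d_adt
```

Rescaling matters because RNA and ADT embeddings live on different numeric scales; without it, the modality with larger raw distances would dominate the joint matrix.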

Step 7 Non-linear dimension reduction (UMAP and t-SNE)

Run UMAP for non-linear dimension reduction of the RNA, ADT, and joint analyses.

Step 9 Visualization ADT vs RNA vs Joint

As indicated by the red circle, joint analysis identified two distinct NK cell subsets: CD8+ NK and CD8- NK. These two NK subsets can also be identified using ADT information alone. The heatmap below shows the distinct surface protein patterns of these two NK subsets (joint clusters 5 and 6). The two subsets cannot be distinguished using RNA alone because their transcriptional expression is essentially identical.

As indicated by the blue circle, joint analysis identified two distinct CD4 T cell subsets: naive CD4 T and memory CD4 T. These two subsets can also be identified using RNA information alone, but cannot be distinguished using ADT alone because their cell surface protein patterns are similar.